Online Learning with Sublinear Best-Action Queries

Neural Information Processing Systems

In online learning, a decision maker repeatedly selects one of a set of actions, with the goal of minimizing the overall loss incurred. Following the recent line of research on algorithms endowed with additional predictive features, we revisit this problem by allowing the decision maker to acquire additional information on the actions to be selected. In particular, we study the power of \emph{best-action queries}, which reveal beforehand the identity of the best action at a given time step. In practice, predictive features may be expensive, so we allow the decision maker to issue at most $k$ such queries. We establish tight bounds on the performance any algorithm can achieve when given access to $k$ best-action queries for different types of feedback models. In particular, we prove that in the full feedback model, $k$ queries are enough to achieve an optimal regret of $\Theta(\min\{\sqrt T, \frac{T}{k}\})$. This finding highlights the significant multiplicative advantage in the regret rate achievable with even a modest (sublinear) number $k \in \Omega(\sqrt{T})$ of queries. Additionally, we study the challenging setting in which the only available feedback is obtained during the time steps corresponding to the $k$ best-action queries.
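As a concrete illustration of the interaction described in the abstract, below is a minimal Python sketch combining the standard Hedge (exponential weights) algorithm under full feedback with best-action queries. The function name and the query-allocation rule (naively spending all $k$ queries on the first $k$ rounds) are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def hedge_with_queries(losses, k, eta):
    """Hedge under full feedback, with k best-action queries spent on the
    first k rounds (a simple, hypothetical allocation rule for illustration).
    `losses` is a list of T rounds, each a list of n per-action losses.
    Returns the realized regret against the best fixed action in hindsight."""
    n = len(losses[0])
    weights = [1.0] * n
    total_loss = 0.0
    for t, loss in enumerate(losses):
        if t < k:
            # Best-action query: the oracle reveals this round's best action.
            action = min(range(n), key=lambda i: loss[i])
        else:
            # Sample an action from the exponential-weights distribution.
            r = random.random() * sum(weights)
            action, acc = 0, weights[0]
            while acc < r:
                action += 1
                acc += weights[action]
        total_loss += loss[action]
        # Full feedback: every action's loss is observed, so update all weights.
        for i in range(n):
            weights[i] *= math.exp(-eta * loss[i])
    best_fixed = min(sum(l[i] for l in losses) for i in range(n))
    return total_loss - best_fixed
```

For the no-query rounds, the usual learning rate $\eta = \sqrt{2 \ln n / T}$ recovers the standard $O(\sqrt{T \ln n})$ Hedge guarantee; the point of the paper is that queries can improve this rate multiplicatively, down to $O(T/k)$ when $k \in \Omega(\sqrt{T})$.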


Online Learning with Sublinear Best-Action Queries

Russo, Matteo, Celli, Andrea, Baldeschi, Riccardo Colini, Fusco, Federico, Haimovich, Daniel, Karamshuk, Dima, Leonardi, Stefano, Tax, Niek

arXiv.org Artificial Intelligence

Online learning is a foundational problem in machine learning. In its simplest version, a decision maker repeatedly interacts with a fixed set of n actions over a time horizon T. At each time, the decision maker needs to choose one of a set of actions; subsequently, it receives an action-dependent loss and observes some feedback. These loss functions are generated by an omniscient (but oblivious) adversary and are only revealed on-the-go. The goal of the decision maker is to design a learning algorithm that achieves small regret with respect to the best fixed action in hindsight, i.e., the difference between the decision maker's loss and that of the best fixed action. Several online learning algorithms have been developed, characterized by optimal instance-independent regret bounds, depending on the feedback model [13, 28]. Following the recent literature on algorithms with machine learning-based predictions (see, e.g., the survey by Mitzenmacher and Vassilvitskii [24]), we study the case where the learner is allowed to issue a limited number of best-action queries to an oracle that reveals the identity of the best action for that step, so that the learner can choose it. This setting is motivated by scenarios in which obtaining accurate predictions on the optimal choice among numerous actions is possible but comes with significant costs and time constraints. For instance, consider an online platform that continuously moderates posted content (e.g., Meta [22, 23] or Google [17]), and the online learning problem it faces: posts are generated one after the other, and the platform's task consists